Extracting structured information effectively and accurately from long unstructured text with LangExtract and LLMs. This article explores Google's LangExtract framework paired with Gemma 3, Google's open-weight LLM, demonstrating how to parse an insurance policy to surface details such as exclusions.
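The core pattern the article describes is prompting a model for structured records and then grounding each extraction back to its exact character span in the source. A minimal stdlib-only sketch of that idea, with a hand-written stand-in for the model reply (LangExtract's real API and the policy text here are illustrative, not from the article):

```python
import json

# A toy policy excerpt standing in for the long document in the article.
POLICY = (
    "Section 4 - Exclusions. This policy does not cover flood damage. "
    "This policy does not cover losses caused by war."
)

# Pretend this JSON came back from Gemma 3 in response to a few-shot prompt.
MODEL_REPLY = json.dumps([
    {"class": "exclusion", "text": "flood damage"},
    {"class": "exclusion", "text": "losses caused by war"},
])

def ground(source: str, reply: str) -> list[dict]:
    """Attach character offsets so every extraction is traceable to the source."""
    records = []
    for item in json.loads(reply):
        start = source.find(item["text"])
        if start != -1:  # keep only extractions actually present in the text
            records.append({**item, "start": start, "end": start + len(item["text"])})
    return records

extractions = ground(POLICY, MODEL_REPLY)
```

Grounding is what makes the output auditable: any record whose text the model hallucinated simply fails the span lookup and is dropped.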
This article explores alternatives to NotebookLM, a Google assistant for synthesizing information from documents. It details NousWise, ElevenLabs, NoteGPT, Notion, Evernote, and Obsidian, outlining their key features, limitations, and considerations for choosing the right tool.
Leveraging MCP for automating your daily routine. This article explores the Model Context Protocol (MCP) and demonstrates how to build a toolkit for analysts using it, including creating a local MCP server with useful tools and integrating it with AI tools like Claude Desktop.
Local large language models can convert massive DataFrames into presentable Markdown reports; this article shows how.
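The formatting step itself needs no LLM: tabular records render to a Markdown table mechanically, and the local model's job is the surrounding narrative. A stdlib-only sketch of that rendering step (the column names and rows are made up for illustration):

```python
def to_markdown(rows: list[dict]) -> str:
    """Render a list of records as a GitHub-flavored Markdown table."""
    headers = list(rows[0])
    # Pad each column to the widest cell so the raw text stays readable.
    widths = {h: max(len(h), *(len(str(r[h])) for r in rows)) for h in headers}
    def line(cells):
        return "| " + " | ".join(str(c).ljust(widths[h]) for h, c in zip(headers, cells)) + " |"
    out = [line(headers), "| " + " | ".join("-" * widths[h] for h in headers) + " |"]
    out += [line([r[h] for h in headers]) for r in rows]
    return "\n".join(out)

rows = [
    {"region": "EMEA", "revenue": 1200},
    {"region": "APAC", "revenue": 950},
]
table = to_markdown(rows)
```

With pandas available, `df.to_markdown()` does the same job in one call; the point is that only summarization and commentary need the model.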
This article details how to accelerate deep learning and LLM inference using Apache Spark, focusing on distributed inference strategies. It covers basic deployment with `predict_batch_udf`, advanced deployment with inference servers like NVIDIA Triton and vLLM, and deployment on cloud platforms like Databricks and Dataproc. It also provides guidance on resource management and configuration for optimal performance.
NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.
This blog post introduces the Semantic Telemetry project at Microsoft Research, which uses a data science approach to analyze how people interact with AI systems, specifically focusing on Copilot in Bing usage. It discusses the complexity of human-AI interactions and how they differ from traditional search.
- Topics: Copilot in Bing chats were analyzed for topic categorization. Technology (21%) was the most common topic, followed by Entertainment (12.8%), Health (11%), and others. Within technology, programming and scripting were prominent subtopics.
- Platform Differences: Mobile users tend to use Copilot for personal tasks, while desktop users engage in more professional activities.
This tutorial demonstrates how to perform semantic clustering of user messages using Large Language Models (LLMs) by prompting them to analyze publicly available Discord messages. It covers data extraction, sentiment scoring, KNN clustering, and visualization, emphasizing that this workflow is faster and less labor-intensive than traditional data science approaches.
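Once each message is reduced to numeric features (for instance an LLM-assigned sentiment score and a message length), the clustering step is ordinary unsupervised learning. A minimal stand-in using plain k-means on hypothetical feature vectors, not the tutorial's exact pipeline:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats; returns final centroids and clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean).
            i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster emptied out.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (sentiment, length) features: short negative vs. long positive messages.
points = [(-0.9, 5), (-0.8, 7), (-0.7, 6), (0.8, 40), (0.9, 42), (0.7, 38)]
centroids, clusters = kmeans(points, k=2)
```

The tutorial's shift is upstream of this step: the features come from prompting an LLM rather than from hand-built embeddings and labeling.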
A comprehensive guide to Large Language Models by Damien Benveniste, covering various aspects from transformer architectures to deploying LLMs.
- Language Models Before Transformers
- Attention Is All You Need: The Original Transformer Architecture
- A More Modern Approach To The Transformer Architecture
- Multi-modal Large Language Models
- Transformers Beyond Language Models
- Non-Transformer Language Models
- How LLMs Generate Text
- From Words To Tokens
- Training LLMs to Follow Instructions
- Scaling Model Training
- Fine-Tuning LLMs
- Deploying LLMs
- TabPFN is a novel foundation model designed for small- to medium-sized tabular datasets, with up to 10,000 samples and 500 features.
- It uses a transformer-based architecture and in-context learning (ICL) to outperform traditional gradient-boosted decision trees on these datasets.